The goal of this document is to guide the development of habitat suitability analyses, including (1) exploring the distributions of and correlations between key variables and (2) testing analysis methods to understand fish-habitat relationships.
The Feather River will be used as a case study location, with the goal of creating a more streamlined workflow that could be applied elsewhere. There are two relevant datasets: (1) the mini snorkel data and (2) the ongoing intermediate-level snorkel survey. These datasets use different methodologies, and part of these analyses will explore the differences between the two data collection methods and in the resulting habitat data.
This markdown focuses on cluster analysis.
Description of sampling
Key variables
Data processing considerations
Chinook salmon, steelhead, tule perch, and speckled dace are observed.
fork length for Chinook
Most Chinook observations are of fry (~40 mm).
fork length for steelhead
Most steelhead observations are of smaller fish, but there is some variation.
depth of microhabitat
depth of fish observation
velocity of microhabitat
velocity of fish observations
count of all species
count by species
Most observations are of Chinook salmon.
count of Chinook
count of steelhead (wild)
count of steelhead (clipped)
summary of percent cover by cover type where cover > 0% and Chinook salmon observed
summary of percent cover by cover type where cover > 0% and Steelhead observed
The following plots summarize percent cover by type and transect code to help describe the types of habitats surveyed. This information needs to be summarized better.
small woody cover
large woody cover
submerged vegetation
undercut bank
half meter overhead
more than half meter overhead
We transformed cover to presence/absence: if any of the cover types is greater than 20%, cover is considered present.
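A minimal sketch of this transformation in Python with pandas, assuming one percent-cover column per cover type and splitting the six cover types listed above into instream and overhead groups. The column names are hypothetical placeholders, and treating a missing percentage as 0% is an assumption rather than something stated about the data.

```python
import pandas as pd

# Hypothetical column names for the cover types listed above.
INSTREAM_COLS = [
    "percent_small_woody_cover",
    "percent_large_woody_cover",
    "percent_submerged_aquatic_veg",
    "percent_undercut_bank",
]
OVERHEAD_COLS = [
    "percent_cover_half_meter_overhead",
    "percent_cover_more_than_half_meter_overhead",
]
THRESHOLD = 20  # percent; cover counts as present only when strictly greater than this


def add_cover_presence(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 0/1 instream_cover and overhead_cover columns."""
    out = df.copy()
    # Assumption: a missing percentage is treated as 0% cover.
    out["instream_cover"] = (out[INSTREAM_COLS].fillna(0) > THRESHOLD).any(axis=1).astype(int)
    out["overhead_cover"] = (out[OVERHEAD_COLS].fillna(0) > THRESHOLD).any(axis=1).astype(int)
    return out
```

Because the comparison is strictly greater than 20, a cover type recorded at exactly 20% does not make cover present.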
instream cover presence (1) and absence (0) for Chinook salmon
Instream cover means that any instream cover type is greater than 20%.
overhead cover presence (1) and absence (0) for Chinook salmon
Overhead cover means that any overhead cover type is greater than 20%.
instream cover presence (1) and absence (0) for Steelhead
overhead cover presence (1) and absence (0) for Steelhead
We checked the correlations between the cover and substrate variables and did not find any highly correlated pairs.
There is no correlation between distance to bottom and velocity.
Percent no cover in channel is highly inversely correlated with submerged aquatic vegetation.
Percent no cover overhead is highly inversely correlated with cover overhead.
None of the percent cover substrate variables are correlated.
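One way to screen for highly correlated pairs is sketched below, assuming the cover and substrate variables are numeric columns of a pandas DataFrame. The 0.7 cutoff on the absolute Pearson correlation is an assumed definition of "highly correlated"; the checks above do not state the value used. A strong inverse relationship, such as percent no cover in channel versus submerged aquatic vegetation, would show up as a pair with r close to -1.

```python
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.7) -> list[tuple[str, str, float]]:
    """Return (var_a, var_b, r) for numeric column pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes("number").corr()  # pairwise-complete Pearson correlations
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle only, so each pair is reported once
            r = corr.loc[a, b]
            if pd.notna(r) and abs(r) >= threshold:
                pairs.append((a, b, float(r)))
    return pairs
```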
The goal of the cluster analysis is to identify groupings of fish observations and to determine what best describes those groupings. Unlike a typical habitat analysis, this analysis focuses only on the characteristics of fish observations.
TODO
Key variables
Data inputs
I tried the cluster analysis multiple times with slight variations in the data used. These are the data ultimately included:
Notes:
- If we add count or fl_mm to the inputs above, we end up with one very large cluster and two smaller ones.
- If we remove the substrate variables and add count, we also end up with one very large cluster and two smaller ones.
- If we do not filter the data to only fish observations and we include count, we end up with four clusters: three are large (~1500) and one has about 500. This may be worth digging into a little further.
- I expected to see an fl_mm effect, but I think there are too few large fish. I decided to filter out large fish because they are not the target of the study and may therefore be leading to erroneous results.
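The two filtering steps in these notes (keeping only fish observations and dropping large fish) could look like the sketch below. It assumes a pandas DataFrame with `count` and `fl_mm` columns and that rows without fish have `count` equal to 0. The 100 mm cutoff is a hypothetical placeholder, since the notes do not say what size counts as "large".

```python
import pandas as pd

MAX_FL_MM = 100  # hypothetical cutoff for a "large" fish; the notes do not give a value


def prepare_cluster_input(df: pd.DataFrame, max_fl_mm: float = MAX_FL_MM) -> pd.DataFrame:
    """Keep rows where fish were observed and drop fish longer than max_fl_mm."""
    fish = df[df["count"] > 0]  # assumption: rows without fish have count == 0
    # Rows with no recorded fork length are kept; only known large fish are removed.
    keep = fish["fl_mm"].isna() | (fish["fl_mm"] <= max_fl_mm)
    return fish[keep].copy()
```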
Dendrogram of clusters
The dendrogram shows how data points are joined into progressively larger clusters. This plot still needs to be cleaned up.
Scree plot
The elbow of the scree plot, where the merge height starts to increase sharply, suggests the ideal number of clusters.
## Height JoinsThis WithThis
## [359,] 207.2381 316 349
## [360,] 210.9009 326 357
## [361,] 224.6831 351 355
## [362,] 259.7803 322 353
## [363,] 262.5066 345 350
## [364,] 288.6768 360 363
## [365,] 301.4911 344 359
## [366,] 354.1795 362 365
## [367,] 385.4339 358 361
## [368,] 510.2824 366 367
## [369,] 683.1727 364 368
## [370,] 695.0076 356 369
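The merge-height table above is the basis of the scree plot. The sketch below reproduces the idea with SciPy, assuming a numeric observations-by-variables array. Standardizing each variable and using Ward linkage are assumptions (R's `hclust` defaults to complete linkage), so the heights may not match the table. SciPy also numbers merged clusters differently from R's `merge` matrix, so its cluster ids are not directly comparable to the JoinsThis/WithThis columns.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def cluster_linkage(X: np.ndarray, method: str = "ward") -> np.ndarray:
    """Hierarchical clustering of standardized X; returns a SciPy linkage matrix."""
    # z-score each column so percent, depth, and velocity variables share a scale;
    # a zero-variance column would produce NaNs and should be dropped first
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # each row of the result is (cluster a, cluster b, merge height, size of new cluster)
    return linkage(Z, method=method, metric="euclidean")


def scree_table(link: np.ndarray, last: int = 12) -> list[tuple[int, float]]:
    """(clusters remaining after the merge, merge height) for the final `last` merges."""
    n_merges = link.shape[0]  # n observations produce n - 1 merges
    n_obs = n_merges + 1
    start = max(0, n_merges - last)
    return [(n_obs - (i + 1), float(link[i, 2])) for i in range(start, n_merges)]
```

Plotting the heights from `scree_table` against the number of clusters gives the scree plot. For a cleaner dendrogram, `scipy.cluster.hierarchy.dendrogram(link, truncate_mode="lastp", p=30)` contracts all but the last 30 merges into leaf nodes, so only the top of the tree is drawn.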
Number of clusters
Based on an analysis of multiple indices, 3 to 4 clusters is the best fit. This could be investigated further to confirm the best number of clusters. I chose 4 clusters based on the results of the scree plot.
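Cutting the tree at a chosen number of clusters and counting the observations in each can be sketched with SciPy's `fcluster`, whose `criterion="maxclust"` option returns at most `k` flat clusters labeled from 1. The z-scoring and Ward linkage here are assumed choices for illustration, not necessarily those used in this analysis.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_sizes(X: np.ndarray, k: int = 4, method: str = "ward") -> tuple[np.ndarray, dict[int, int]]:
    """Cut a hierarchical clustering of standardized X into at most k clusters.

    Returns each observation's cluster label and the number of observations per label.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each variable
    link = linkage(Z, method=method, metric="euclidean")
    labels = fcluster(link, t=k, criterion="maxclust")  # labels run from 1 to k
    ids, counts = np.unique(labels, return_counts=True)
    return labels, {int(i): int(c) for i, c in zip(ids, counts)}
```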
The 4 clusters have very similar numbers (and sizes) of fish observations across a similar distribution of months. The species observed are similar, though the low-velocity, high-aquatic-vegetation cluster has more steelhead. The defining characteristics include the following:
The following plots could be used to visualize these results.
There are small differences in local depth and velocity between clusters. A trend is visible, but these differences may not be statistically significant.
There are very clear differences in percent small gravel, large gravel, and cobble between clusters. There are smaller differences in percent boulder.
There are very clear differences in percent submerged aquatic vegetation and percent cover overhead. There are smaller differences in percent small woody cover. There are small differences in percent large woody cover and undercut banks.
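A per-cluster summary table is one way to tabulate these defining characteristics alongside the plots. The sketch assumes a pandas DataFrame of the clustering variables and a label sequence in the same row order; the variable names passed in are placeholders. Comparing each cluster mean with the overall mean highlights the variables that set a cluster apart.

```python
import pandas as pd


def cluster_profiles(df: pd.DataFrame, labels, variables: list[str]) -> pd.DataFrame:
    """Mean of each variable by cluster, the cluster size, and each mean's offset from the overall mean."""
    tmp = df[variables].copy()
    tmp["cluster"] = list(labels)  # positional assignment; labels must match df row order
    profile = tmp.groupby("cluster")[variables].mean()
    profile["n"] = tmp.groupby("cluster").size()
    # Difference from the overall mean shows which variables distinguish each cluster.
    for v in variables:
        profile[f"{v}_vs_overall"] = profile[v] - tmp[v].mean()
    return profile
```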
This section is currently in progress because some data processing is still needed.
Only Chinook salmon are observed.
TODO check that these were not filtered from dataset
## [1] NA "chinook" "unknown"
fork length for Chinook
Most Chinook observations are of fry (~40 mm), though there is some variation.
depth
Instream cover codes
Overhead cover codes
## [1] NA "A" "BDEF" "FE" "D" "BCDE" "B" "C" "EC"
## [10] "BE" "BCDEF" "BDF" "BD" "E" "EF" "F" "BEF" "BCE"
## [19] "BCEF" "EB" "CE" "BCD" "CDF" "BDE" "BF" "CEF" "BED"
## [28] "CF" "EFD" "AE" "BC" "BEC" "DE" "CDEF" "BEFD" "CD"
## [37] "ECF" "BCF" "BCDF" "CDE" "CED" "DF" "AG" "ABD" "AD"
## [46] "FB" "AB" "bdf"
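The overhead cover codes above mix upper and lower case (`"bdf"`) and list the same letters in different orders (`"EF"` and `"FE"`). A sketch of one normalization, assuming each letter denotes a separate cover type, converts the codes to one 0/1 column per letter. The `cover_` column prefix is a hypothetical naming choice, and the sketch does not assign meanings to the letters.

```python
import pandas as pd


def cover_code_indicators(codes: pd.Series) -> pd.DataFrame:
    """One 0/1 column per cover letter found in combined codes such as "BDEF" or "bdf"."""
    # Normalize case so "bdf" and "BDF" match; missing codes become "" (no cover letters).
    clean = codes.fillna("").astype(str).str.upper().str.strip()
    letters = sorted({ch for code in clean for ch in code if ch.isalpha()})
    return pd.DataFrame(
        {f"cover_{ch}": clean.apply(lambda code, ch=ch: int(ch in code)) for ch in letters},
        index=codes.index,
    )
```

Because each letter becomes its own column, `"EF"` and `"FE"` produce identical rows.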
## # A tibble: 1 × 1
## n
## <int>
## 1 2061
## [1] NA 1 2 5 4 3 6
## # A tibble: 1 × 1
## n
## <int>
## 1 2062